What the app takes as input

  • A single-cell dataset pre-clustered with RaceID3, Monocle2 or Seurat3 and saved as a serialized R object.
  • The dataset can be produced either by the snakePipes scRNAseq workflow or by a custom analysis, as long as all required slots are populated.

What the app can do

  • Visualize the original clusters on a t-SNE map
  • Let the user specify the desired number of clusters, re-cluster the dataset and visualize the updated clusters on a t-SNE map
  • Extract cluster marker genes (up to 10 per cluster), summarize them in a table and visualize them on a heatmap
  • Let the user provide one or more gene IDs and visualize their expression on a t-SNE map
  • Calculate and list the 10 genes most correlated with the genes provided by the user
  • Plot pairwise expression for genes provided by the user

How the app works

  • The user uploads an RDS or RData file not exceeding 500 MB.
  • A single-cell object formatted and preprocessed with the respective package is loaded into memory.
  • The processing is done using RaceID3 functions, using a combination of Monocle2 (clustering, visualization) and Seurat3 (marker gene extraction and visualization) functions, or using Seurat3 only. With RaceID3, outlier detection is performed but ignored for visualization, so that only the original clusters are considered.
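
The loading step described above can be sketched in base R. This is only an illustration under stated assumptions, not the app's actual code: the helper name load_sc_object, the max_mb argument and the rule of taking the first object stored in an RData file are all hypothetical.

```r
# Hypothetical helper: load a single-cell object from an .rds or .RData file,
# refusing files larger than a size limit (500 MB, as stated in the text).
load_sc_object <- function(path, max_mb = 500) {
  size_mb <- file.size(path) / 1024^2
  if (is.na(size_mb)) stop("File not found: ", path)
  if (size_mb > max_mb) stop("File exceeds ", max_mb, " MB")
  ext <- tolower(tools::file_ext(path))
  if (ext == "rds") {
    return(readRDS(path))
  }
  if (ext == "rdata") {
    env <- new.env()
    obj_names <- load(path, envir = env)
    # Assumption: the first object in the file is the one of interest
    return(get(obj_names[1], envir = env))
  }
  stop("Unsupported file extension: ", ext)
}

# Round trip with a toy list standing in for a real single-cell object
toy <- list(counts = matrix(1:6, nrow = 2), clusters = c(1, 1, 2))
rds_file <- tempfile(fileext = ".rds")
saveRDS(toy, rds_file)
rdata_file <- tempfile(fileext = ".RData")
save(toy, file = rdata_file)

from_rds <- load_sc_object(rds_file)
from_rdata <- load_sc_object(rdata_file)
```

Loading the RData file into a separate environment keeps the objects it contains from overwriting variables in the calling session.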

A note on processing speed

  • The more data is loaded, the longer some of the functions will take. The app has not been tested with very large datasets, and performance cannot be guaranteed for more than 5000 cells.
  • Data loading is rather slow; please allow up to a minute, depending on the size of your dataset. Depending on the package chosen, re-clustering or extracting cluster markers may also be slow.

Dataset selection

Select the R package that you’d like to conduct the analysis with from the “Select R package” pulldown list.

To upload a dataset, use the ‘Browse’ button in the “Choose file to upload” field.

Hint: the dataset must be fully processed and contain the initial clustering information!
Hint 2: for RaceID, the dataset must be preprocessed with version 3 of the package; for Monocle, with version 2; for Seurat, with the CRAN-released version 3.
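
Before saving a dataset for upload, the version requirements in the hints above could be checked in your own R session. The sketch below is an assumption-laden illustration: has_major_version is a hypothetical helper, and the package name passed to it must match the name under which the package is installed.

```r
# Hypothetical helper: is a package installed with the expected major version?
has_major_version <- function(pkg, major) {
  if (!requireNamespace(pkg, quietly = TRUE)) return(FALSE)
  v <- utils::packageVersion(pkg)
  v >= package_version(paste0(major, ".0.0")) &&
    v < package_version(paste0(major + 1, ".0.0"))
}

# TRUE only if a 3.x release of Seurat is installed in this R library
seurat_ok <- has_major_version("Seurat", 3)
```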

This vignette showcases the use of a dataset from a custom path.

Example 1: RaceID

A published dataset stored under “/data/processing/scRNAseq_shiny_app_example_data/GSE81076_raceid.workspaceR/sc.minT1000.RData” will be analyzed. See Grün D, Muraro MJ, Boisset JC, Wiebrands K et al. De Novo Prediction of Stem Cell Identity using Single-Cell Transcriptome Data. Cell Stem Cell 2016 Aug 4;19(2):266-277 for the original publication.

Select RaceID3 as analysis R package. Upload the dataset (wait until complete) and click on ‘Select dataset’ (Figure 1).

Figure 1: R package selection and dataset upload

After some lag, the head of the normalized data appears in the “Input Data” tab (Figure 2). You can also check the dimensions of your matrix and the summary of the TPC (transcripts per cell) distribution in the corresponding boxes.

Figure 2: Head of normalized counts and data summary
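
The quantities shown in the “Input Data” tab can be reproduced outside the app along the following lines. This is a minimal sketch on a simulated count matrix (genes in rows, cells in columns); how the app extracts the matrix from a RaceID, Monocle or Seurat object is not shown here.

```r
set.seed(1)
# Simulated count matrix: 50 genes (rows) x 20 cells (columns)
counts <- matrix(rpois(50 * 20, lambda = 2), nrow = 50,
                 dimnames = list(paste0("gene", 1:50), paste0("cell", 1:20)))

dim(counts)              # matrix dimensions: genes x cells
head(counts[, 1:5])      # head of the data for the first five cells
tpc <- colSums(counts)   # total transcripts per cell (TPC)
summary(tpc)             # summary of the TPC distribution
```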

In the “Cell map and clustering” tab, a cluster membership t-SNE plot for the preselected number of clusters is displayed for the loaded dataset (Figure 3). A plot of the within-cluster dispersion as a function of cluster number appears in the “Metrics for cluster number selection” box, with a silhouette plot illustrating cluster assignment quality alongside it (Figure 4).

Use this information to guide your cluster number choice as described in the package vignette.
The dataset was originally clustered into 6 clusters.

Figure 3: Cluster plots for uploaded dataset

Figure 4: Cluster quality metrics for loaded dataset
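
To make the two cluster quality metrics concrete, the sketch below computes a within-cluster dispersion curve and mean silhouette widths for a range of cluster numbers. It is a generic illustration using k-means on simulated data, not the RaceID3 procedure; the helper mean_silhouette is hypothetical and written in base R.

```r
set.seed(42)
# Simulated data: three well-separated groups of 30 points in 2 dimensions
x <- rbind(matrix(rnorm(60, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(60, mean = 3, sd = 0.3), ncol = 2),
           matrix(rnorm(60, mean = 6, sd = 0.3), ncol = 2))

# Total within-cluster sum of squares for k = 1..6
ks <- 1:6
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 10)$tot.withinss)

# Mean silhouette width for a given cluster assignment (needs >= 2 clusters)
mean_silhouette <- function(x, cl) {
  d <- as.matrix(dist(x))
  widths <- sapply(seq_len(nrow(x)), function(i) {
    own <- cl == cl[i]
    own[i] <- FALSE
    if (!any(own)) return(0)        # singleton cluster: width set to 0
    a <- mean(d[i, own])            # mean distance within own cluster
    b <- min(sapply(setdiff(unique(cl), cl[i]),
                    function(k) mean(d[i, cl == k])))  # nearest other cluster
    (b - a) / max(a, b)
  })
  mean(widths)
}

sil <- sapply(2:6, function(k) {
  mean_silhouette(x, kmeans(x, centers = k, nstart = 10)$cluster)
})
best_k <- (2:6)[which.max(sil)]
```

A sharp bend in wss and a maximum in sil at the same cluster number are the kind of evidence these plots are meant to provide.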

Suppose you decide to change the number of clusters, e.g. to 3. Update the value on the ruler and click on ‘Update cluster plots’. This initiates re-clustering (Figure 5), and after some waiting time, the updated t-SNE and silhouette plots replace the old ones (Figure 6, Figure 7).

Figure 5: Update cluster number choice

Figure 6: Cluster plots for updated cluster number

Figure 7: Cluster quality metrics for updated cluster number

To obtain markers (by default: 2) for each cluster, click on ‘Get marker genes’ in the ‘Marker Gene Calculation’ page (Figure 8). After a (rather long) while, a table with top markers as well as a heatmap corresponding to it appears (Figure 9).

Figure 8: Request top marker genes

Figure 9: Top marker genes result

To increase the number of markers displayed in the table and on the heatmap, move the ruler above the table. The two outputs will be updated (Figure 10).

Figure 10: Update the number of marker genes displayed
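
Selecting a given number of top markers per cluster can be sketched as follows. The marker table, its column names (cluster, gene, logFC) and the helper top_markers are illustrative assumptions; the table produced by the app may be laid out differently.

```r
# Toy marker table; column names are illustrative assumptions
markers <- data.frame(
  cluster = c(1, 1, 1, 2, 2, 2),
  gene    = c("A", "B", "C", "D", "E", "F"),
  logFC   = c(2.1, 3.4, 0.9, 1.2, 2.8, 2.0),
  stringsAsFactors = FALSE
)

# Keep the n markers with the highest log fold change in each cluster
top_markers <- function(markers, n = 2) {
  ordered <- markers[order(markers$cluster, -markers$logFC), ]
  by_cluster <- split(ordered, ordered$cluster)
  out <- do.call(rbind, lapply(by_cluster, head, n = n))
  rownames(out) <- NULL
  out
}

top2 <- top_markers(markers, n = 2)

# Saving the table to a file, analogous to downloading it from the app
write.csv(top2, file.path(tempdir(), "top_markers.csv"), row.names = FALSE)
```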

To download the marker table, use the ‘Download table’ button (Figure 11).

Figure 11: Download cluster marker table

In the “Marker Gene Visualization” tab, you may plot expression of selected genes, as long as they are expressed in at least 1 cell in the dataset. To select a gene, copy one of the top markers into the “GeneID” field in the box and click on ‘Select genes’ (Figure 12).

Figure 12: Select gene IDs for visualization

Check in the ‘Genes used’ field that the selected gene(s) is (are) expressed (Figure 12).

Modify plot title and expression scale if needed, and click on ‘Plot cell map’ to visualise gene expression for that gene(s) (Figure 13).

Figure 13: Tsne map with marker gene expression

In the “Correlation Analyses” tab, you may query your dataset for the genes most correlated with your genes of interest and obtain a pairwise gene expression plot. Again, enter a gene ID in the side box and click on the “Select genes” button in this tab (Figure 14).

Figure 14: Select gene IDs for correlation analysis

A violin plot of the Pearson correlation calculated for log2-transformed counts will appear, alongside a list of the top 10 genes with the highest absolute correlation to the selected genes (Figure 15).

Figure 15: Display top correlated genes
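
The ranking behind the list of top correlated genes can be sketched in base R as below. This is an illustration on simulated counts, not the app's code: the pseudocount of 1 used before the log2 transformation and the helper top_correlated are assumptions.

```r
set.seed(7)
# Simulated counts: 30 genes (rows) x 40 cells (columns)
counts <- matrix(rpois(30 * 40, lambda = 5), nrow = 30,
                 dimnames = list(paste0("gene", 1:30), NULL))
# Make gene2 track gene1 closely so that one strong correlation exists
counts["gene2", ] <- counts["gene1", ] * 2

log_expr <- log2(counts + 1)   # pseudocount of 1 is an assumption

# Rank all other genes by absolute Pearson correlation with the query gene
top_correlated <- function(log_expr, gene, n = 10) {
  others <- log_expr[rownames(log_expr) != gene, , drop = FALSE]
  r <- cor(t(others), log_expr[gene, ])[, 1]   # Pearson is the default method
  r <- r[order(abs(r), decreasing = TRUE)]
  head(r, n)
}

top10 <- top_correlated(log_expr, "gene1")
```

Ordering by the absolute value keeps strongly anti-correlated genes in the list, which matches the "highest absolute correlation" wording used in the text.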

To plot pairwise correlation for selected genes, enter gene IDs into the boxes collecting information for X and Y axes in the bottom half of the page, adjust the plot title if necessary, and click on the “Plot expression” button (Figure 16).

Figure 16: Select gene IDs for pairwise expression plot

A pairwise plot of normalized counts will appear (Figure 17).

Figure 17: Pairwise expression plot

Example 2: Monocle

A published dataset stored under “/data/processing/scRNAseq_shiny_app_example_data/GSE81076_monocle.workspaceR/minT5000.mono.set.RData” will be analyzed. See Grün D, Muraro MJ, Boisset JC, Wiebrands K et al. De Novo Prediction of Stem Cell Identity using Single-Cell Transcriptome Data. Cell Stem Cell 2016 Aug 4;19(2):266-277 for the original publication.

Select Monocle2 as analysis package. Upload dataset (wait till complete) and click on ‘Select dataset’ (Figure 18).

Figure 18: R package selection and dataset upload

After some lag, the head of the normalized data appears in the “Input Data” tab (Figure 19). You can also check the dimensions of your matrix and the summary of the TPC (transcripts per cell) distribution in the corresponding boxes.

Figure 19: Head of normalized counts and data summary

In the “Cell map and clustering” tab, a cluster membership t-SNE plot for the preselected number of clusters is displayed for the loaded dataset (Figure 20). A plot of delta (distance) versus rho (density) appears in the “Metrics for cluster number selection” box, with a silhouette plot illustrating cluster assignment quality alongside it (Figure 21).

Use this information to guide your cluster number choice as described in the package vignette.
The dataset was originally clustered into 17 clusters.

Figure 20: Cluster plots for uploaded dataset

Figure 21: Cluster quality metrics for loaded dataset

Suppose you decide to change the number of clusters, e.g. to 3. Update the value on the ruler and click on ‘Update cluster plots’. This initiates re-clustering (Figure 22), and after some waiting time, the updated t-SNE and silhouette plots replace the old ones (Figure 23, Figure 24).

Figure 22: Update cluster number choice

Figure 23: Cluster plots for updated cluster number

Figure 24: Cluster quality metrics for updated cluster number

To obtain markers (by default: 2) for each cluster, click on ‘Get marker genes’ in the ‘Marker Gene Calculation’ page (Figure 25). After a (rather long) while, a table with top markers as well as a heatmap corresponding to it appears (Figure 26). This calculation is done using the CRAN single-cell analysis package ‘Seurat’.

Figure 25: Request top marker genes

Figure 26: Top marker genes result

To increase the number of markers displayed in the table and on the heatmap, move the ruler above the table. The two outputs will be updated (Figure 27).

Figure 27: Update the number of marker genes displayed

To download the marker table, use the ‘Download table’ button (Figure 28).

Figure 28: Download cluster marker table

In the “Marker Gene Visualization” tab, you may plot expression of selected genes, as long as they are expressed in at least 1 cell in the dataset. To select a gene, copy one of the top markers into the “GeneID” field in the box and click on ‘Select genes’ (Figure 29).

Figure 29: Select gene IDs for visualization

Check in the ‘Genes used’ field that the selected gene(s) is (are) expressed (Figure 29).

Modify plot title and expression scale if needed, and click on ‘Plot cell map’ to visualise gene expression for that gene(s) (Figure 30).

Figure 30: Tsne map with marker gene expression

In the “Correlation Analyses” tab, you may query your dataset for the genes most correlated with your genes of interest and obtain a pairwise gene expression plot. Again, enter a gene ID in the side box and click on the “Select genes” button in this tab (Figure 31).

Figure 31: Select gene IDs for correlation analysis

A violin plot of the Pearson correlation calculated for log2-transformed counts will appear, alongside a list of the top 10 genes with the highest absolute correlation to the selected genes (Figure 32).

Figure 32: Display top correlated genes

To plot pairwise correlation for selected genes, enter gene IDs into the boxes collecting information for X and Y axes in the bottom half of the page, adjust the plot title if necessary, and click on the “Plot expression” button (Figure 33).

Figure 33: Select gene IDs for pairwise expression plot

A pairwise plot of normalized counts will appear (Figure 34).

Figure 34: Pairwise expression plot

Example 3: Seurat

A published dataset stored under “/data/processing/scRNAseq_shiny_app_example_data/GSE75478_seuset.umap.RDS” will be analyzed. See Velten L, Haas SF, Raffel S, Blaszkiewicz S et al. Human haematopoietic stem cell lineage commitment is a continuous process. Nat Cell Biol 2017 Apr;19(4):271-281 for the original publication.

Select Seurat3 as analysis package. Upload dataset (wait till complete) and click on ‘Select dataset’ (Figure 35).

Figure 35: R package selection and dataset upload

After some lag, the head of the normalized data appears in the “Input Data” tab (Figure 36). You can also check the dimensions of your matrix and the summary of the TPC (transcripts per cell) distribution in the corresponding boxes.

Figure 36: Head of normalized counts and data summary

In the “Cell map and clustering” tab, a cluster membership t-SNE plot for the preselected number of clusters is displayed for the loaded dataset (Figure 37). A clustree plot appears in the “Metrics for cluster number selection” box, as long as at least two cluster assignment columns are available in the data; otherwise, the box remains blank (Figure 38). A silhouette plot illustrating cluster assignment quality is displayed alongside it.

Figure 37: Cluster plots for uploaded dataset

Figure 38: Cluster quality metrics for loaded dataset
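
Whether a dataset carries at least two cluster assignment columns can be checked before upload. The sketch below works on a plain data frame standing in for the cell metadata table; the “RNA_snn_res.” column prefix is an assumption about how the clustering columns are named in your object and may need adjusting.

```r
# Toy stand-in for the cell metadata table of a Seurat object.
# The "RNA_snn_res." prefix is an assumption about the column naming.
meta <- data.frame(
  nCount_RNA      = c(1200, 950, 3100, 2200),
  RNA_snn_res.0.2 = c(0, 0, 1, 1),
  RNA_snn_res.0.8 = c(0, 1, 2, 2)
)

# Columns holding cluster assignments at different resolutions
cluster_cols <- grep("^RNA_snn_res\\.", colnames(meta), value = TRUE)

# A clustering tree needs at least two such columns to compare
can_draw_clustree <- length(cluster_cols) >= 2
```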

Use this information to guide your cluster number choice as described in the package vignette.
The dataset was originally clustered into 8 clusters.

Suppose you decide to change the resolution, e.g. to 0.2. Update the value on the ruler and click on ‘Update cluster plots’. This initiates re-clustering (Figure 39), and after some waiting time, the updated t-SNE, clustree and silhouette plots replace the old ones (Figure 40, Figure 41).

Figure 39: Update cluster number choice

Figure 40: Cluster plots for updated cluster number

Figure 41: Cluster quality metrics for updated cluster number

To obtain markers (by default: 2) for each cluster, click on ‘Get marker genes’ in the ‘Marker Gene Calculation’ page (Figure 42). After a (rather long) while, a table with top markers as well as a heatmap corresponding to it appears (Figure 43).

Figure 42: Request top marker genes

Figure 43: Top marker genes result

To increase the number of markers displayed in the table and on the heatmap, move the ruler above the table. The two outputs will be updated (Figure 44).

Figure 44: Update the number of marker genes displayed

To download the marker table, use the ‘Download table’ button (Figure 45).

Figure 45: Download cluster marker table

In the “Marker Gene Visualization” tab, you may plot expression of selected genes, as long as they are expressed in at least 1 cell in the dataset. To select a gene, copy one of the top markers into the “GeneID” field in the box and click on ‘Select genes’ (Figure 46).

Figure 46: Select gene IDs for visualization

Check in the ‘Genes used’ field that the selected gene(s) is (are) expressed (Figure 46).

Modify plot title and expression scale if needed, and click on ‘Plot tsne map’ to visualise gene expression for that gene(s) (Figure 47).

Figure 47: Tsne map with marker gene expression

In the “Correlation Analyses” tab, you may query your dataset for the genes most correlated with your genes of interest and obtain a pairwise gene expression plot. Again, enter a gene ID in the side box and click on the “Select genes” button in this tab (Figure 48).

Figure 48: Select gene IDs for correlation analysis

A violin plot of the Pearson correlation calculated for log2-transformed counts will appear, alongside a list of the top 10 genes with the highest absolute correlation to the selected genes (Figure 49).

Figure 49: Display top correlated genes

To plot pairwise correlation for selected genes, enter gene IDs into the boxes collecting information for X and Y axes in the bottom half of the page, adjust the plot title if necessary, and click on the “Plot expression” button (Figure 50).

Figure 50: Select gene IDs for pairwise expression plot

A pairwise plot of normalized counts will appear (Figure 51).

Figure 51: Pairwise expression plot

General: Documenting your analysis

To keep track of the parameters you used to generate your plots, it is recommended that you encode them either in the plot titles (customizable by the user) or in the file names under which you save your plots.

To keep track of the R and R package versions, you may want to inspect the ‘sessionInfo’ tab, which contains the output of the sessionInfo() R command (Figure 52). At the bottom of the page, two buttons are available (Figure 53). Click on ‘Download session info’ or ‘Download your data’ to save the respective file on your computer.

Figure 52: Session Info tab

Figure 53: Download documentation and modified dataset
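
The same session information can be recorded from any R session in which you prepare or post-process your data. The snippet below is a minimal sketch; the output file name is a hypothetical choice.

```r
# Write the output of sessionInfo() to a text file (file name is hypothetical)
info_file <- file.path(tempdir(), "sessionInfo.txt")
writeLines(capture.output(sessionInfo()), info_file)
```

Storing this file next to the saved plots documents which R and package versions produced them.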

Lastly, the code behind the app can be retrieved from “https://github.com/maxplanck-ie/scRNAseq_shiny_app” for the given version of the app. The app version is displayed at the bottom of the sidebar (Figure 54).

Figure 54: App version